feat: report absolute eval time in nanoseconds instead of relative to reference#1383

Closed
ArpanC6 wants to merge 4 commits intoTuringLang:mainfrom
ArpanC6:improve/benchmark-absolute-time
Conversation

@ArpanC6
Contributor

@ArpanC6 ArpanC6 commented May 3, 2026

Closes #1374 (partially)

What was the problem?

The benchmark reported t(eval) as a ratio relative to an arbitrary
reference function (simple_assume_observe_non_model), which made
results hard to interpret and added unnecessary noise.

What did I fix?

  • Removed the reference time baseline
  • Now reporting t(eval) in absolute nanoseconds
  • t(grad)/t(eval) ratio is kept as-is
  • Updated column headers accordingly

Note

Introducing Stan as an external baseline is a separate, more involved
task left for a future PR.
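
The reporting change can be illustrated with a minimal, self-contained sketch. This is not the actual DynamicPPL code (the benchmarks there are Julia and use BenchmarkTools.jl); all function names here are hypothetical, and the timed closures stand in for model evaluation and gradient computation.

```python
import time

def measure_ns(f, repeats=1000):
    """Best absolute runtime of f() in nanoseconds over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter_ns()
        f()
        best = min(best, time.perf_counter_ns() - start)
    return best

def report(model_eval, model_grad):
    """Report absolute t(eval) in ns and the retained t(grad)/t(eval) ratio,
    with no reference-function baseline."""
    t_eval = measure_ns(model_eval)
    t_grad = measure_ns(model_grad)
    return {"t(eval) [ns]": t_eval, "t(grad)/t(eval)": t_grad / t_eval}

# Hypothetical workloads standing in for model evaluation and its gradient.
row = report(lambda: sum(range(100)), lambda: sum(range(300)))
print(row)
```

The point of the change is visible in the return value: `t(eval)` is an absolute number with units, interpretable on its own, while the dimensionless `t(grad)/t(eval)` ratio stays because comparing gradient cost to evaluation cost within one run is still meaningful.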

@codecov

codecov Bot commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.26%. Comparing base (2691e7c) to head (0049bee).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1383   +/-   ##
=======================================
  Coverage   82.26%   82.26%           
=======================================
  Files          50       50           
  Lines        3535     3535           
=======================================
  Hits         2908     2908           
  Misses        627      627           


@ArpanC6
Contributor Author

ArpanC6 commented May 3, 2026

Hello @yebai, this PR partially addresses #1374.

I've gone through the issue carefully and made the following changes:

  • Removed the arbitrary reference time baseline (simple_assume_observe_non_model)
  • t(eval) is now reported in absolute nanoseconds, which is more meaningful and interpretable
  • t(grad)/t(eval) ratio is kept as-is since it was already useful
  • Updated column headers accordingly

I also noted @penelopeysm's point that introducing an external baseline doesn't simplify comparative benchmarks: you still need two checkouts regardless. So I've kept this PR focused on absolute time reporting for now.

Note: the Benchmarking / combine results check is failing, but this appears to be a pre-existing CI issue unrelated to this PR.

Introducing Stan as an external baseline would be a separate, more involved task; I'm happy to work on that as a follow-up PR.

@ArpanC6 ArpanC6 force-pushed the improve/benchmark-absolute-time branch from 90f7d6b to 03079d5 Compare May 5, 2026 02:04
@ArpanC6 ArpanC6 closed this May 5, 2026
@ArpanC6
Contributor Author

ArpanC6 commented May 5, 2026

Closing this PR, as upstream has since merged a more comprehensive benchmark improvement in #1385. The absolute time reporting approach from this PR has been superseded.
